AITopics | black box variational inference

Approximating a probability density in a tractable manner is a central task in Bayesian statistics. Variational Inference (VI) is a popular technique that achieves tractability by choosing a relatively simple variational approximation. Borrowing ideas from the classic boosting framework, recent approaches attempt to boost VI by replacing the selection of a single density with an iteratively constructed mixture of densities. In order to guarantee convergence, previous works impose stringent assumptions that require significant effort for practitioners. Specifically, they require a custom implementation of the greedy step (called the LMO) for every probabilistic model with respect to an unnatural variational family of truncated distributions.

black box variational inference, name change, variational inference, (4 more...)

Neural Information Processing Systems

Industry: Transportation > Air (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Local Expectation Gradients for Black Box Variational Inference

Neural Information Processing Systems, Oct-11-2024, 09:38:54 GMT

We introduce local expectation gradients, a general purpose stochastic variational inference algorithm for constructing stochastic gradients by sampling from the variational distribution. This algorithm divides the problem of estimating the stochastic gradients over multiple variational parameters into smaller sub-tasks so that each sub-task intelligently explores the most relevant part of the variational distribution. This is achieved by performing an exact expectation over the single random variable that most correlates with the variational parameter of interest, resulting in a Rao-Blackwellized estimate that has low variance. Our method works efficiently for both continuous and discrete random variables. Furthermore, the proposed algorithm has interesting similarities with Gibbs sampling but, unlike Gibbs sampling, can be trivially parallelized.
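The idea in this abstract can be sketched for one simple case. The code below is only an illustration under stated assumptions, not the authors' implementation: it assumes a fully factorized Bernoulli variational distribution with logits `theta`, and an objective `f(z)` that does not itself depend on `theta`. The names `local_expectation_grad` and `f` are hypothetical. For each parameter, the other variables are sampled, and the expectation over the single variable tied to that parameter is computed exactly by summing over its two values.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def local_expectation_grad(f, theta, num_samples=10, rng=None):
    """Estimate d/d(theta) of E_q[f(z)] for q(z) = prod_i Bernoulli(sigmoid(theta_i)).

    Assumption: theta is a 1-D array and f maps a 0/1 vector to a scalar.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    p = sigmoid(theta)
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        # Draw one joint sample; it supplies the values of all other variables.
        z = (rng.random(theta.shape) < p).astype(float)
        for i in range(theta.size):
            z_on, z_off = z.copy(), z.copy()
            z_on[i], z_off[i] = 1.0, 0.0
            # Exact sum over z_i: dq(z_i=1)/dtheta_i = p(1-p), dq(z_i=0)/dtheta_i = -p(1-p).
            grad[i] += p[i] * (1.0 - p[i]) * (f(z_on) - f(z_off))
    return grad / num_samples
```

Because the inner loop over `i` uses only the shared sample `z`, the per-parameter sub-tasks are independent of one another, which is what makes this kind of estimator easy to parallelize.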

algorithm, black box variational inference, local expectation gradient, (5 more...)

Neural Information Processing Systems

Industry: Transportation > Air (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reviews: Boosting Black Box Variational Inference

Neural Information Processing Systems, Oct-7-2024, 13:53:26 GMT

In the submission, the authors aim to develop a black-box boosting method for variational inference, which takes a family of variational distributions and finds a mixture of distributions from that family that approximates a given posterior distribution well. The main keyword here is black-box; white-box, restricted approaches already exist. To achieve their aim, the authors formulate a version of the Frank-Wolfe algorithm and instantiate it with the usual KL objective of variational inference. They then derive a condition for the convergence of this instantiation that is more permissive than the usual smoothness condition and is based on a reformulation of the bounded curvature condition (Theorem 2). They also show how the constrained optimization problem included in the instantiation of Frank-Wolfe can be expressed in terms of a more intuitive objective, called RELBO in the submission.
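The Frank-Wolfe structure the review refers to can be illustrated on a toy problem. The sketch below is not the paper's method: it minimizes a hypothetical smooth objective over the probability simplex rather than the KL objective over mixtures of densities, and the names `frank_wolfe_simplex` and `grad` are illustrative. What carries over is the shape of the iteration: a linear minimization oracle (LMO) selects one new element per step, which is then mixed into the current iterate with a step size gamma.

```python
import numpy as np


def frank_wolfe_simplex(grad, dim, steps=200):
    """Minimize a smooth convex function over the probability simplex.

    grad: callable returning the gradient at a point w (assumed, user-supplied).
    """
    w = np.zeros(dim)
    w[0] = 1.0  # start at a vertex of the simplex
    for t in range(steps):
        g = grad(w)
        # LMO over the simplex: the vertex with the smallest gradient coordinate.
        s = np.zeros(dim)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size
        # Convex combination keeps the iterate inside the simplex.
        w = (1.0 - gamma) * w + gamma * s
    return w


# Toy usage: recover a target weight vector under a squared-error objective.
target = np.array([0.2, 0.5, 0.3])
weights = frank_wolfe_simplex(lambda w: w - target, dim=3, steps=500)
```

In the boosting setting, the analogue of the vertex `s` is a new mixture component and the analogue of `w` is the current mixture, so each iteration adds at most one component.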

algorithm, bayesian matrix factorization, black box variational inference, (9 more...)

Neural Information Processing Systems

Industry: Transportation > Air (0.88)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.59)

Perturbative Black Box Variational Inference

Robert Bamler, Cheng Zhang, Manfred Opper, Stephan Mandt

Neural Information Processing Systems, Oct-4-2024, 03:46:35 GMT

Black box variational inference (BBVI) with reparameterization gradients triggered the exploration of divergence measures other than the Kullback-Leibler (KL) divergence, such as alpha divergences. In this paper, we view BBVI with generalized divergences as a form of estimating the marginal likelihood via biased importance sampling. The choice of divergence determines a bias-variance trade-off between the tightness of a bound on the marginal likelihood (low bias) and the variance of its gradient estimators. Drawing on variational perturbation theory of statistical physics, we use these insights to construct a family of new variational bounds. Enumerated by an odd integer order K, this family captures the standard KL bound for K = 1, and converges to the exact marginal likelihood as K → ∞. Compared to alpha-divergences, our reparameterization gradients have a lower variance. We show in experiments on Gaussian Processes and Variational Autoencoders that the new bounds are more mass covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.

inference, marginal likelihood, variance, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Berlin (0.04)

Industry: Transportation > Air (0.62)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Variance Control for Black Box Variational Inference Using The James-Stein Estimator

Dayta, Dominic B.

arXiv.org Machine Learning, May-8-2024

Black Box Variational Inference is a promising framework in a succession of recent efforts to make Variational Inference more "black box". However, in its basic version it either fails to converge due to instability or requires some fine-tuning of the update steps prior to execution, which hinders it from being completely general purpose. We propose a method for regulating its parameter updates by reframing stochastic gradient ascent as a multivariate estimation problem. We examine the properties of the James-Stein estimator as a replacement for the arithmetic mean of Monte Carlo estimates of the gradient of the evidence lower bound. The proposed method provides relatively weaker variance reduction than Rao-Blackwellization, but offers the tradeoff of being simpler and requiring no fine-tuning on the part of the analyst. Performance on benchmark datasets also demonstrates consistent performance on par with or better than the Rao-Blackwellized approach in terms of model fit and time to convergence.
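The replacement of the arithmetic mean by a James-Stein estimator can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: it uses the positive-part James-Stein form, shrinks toward zero, and plugs in a single pooled variance estimate across gradient coordinates. The function name `james_stein_mean` is hypothetical, and the paper's exact estimator may differ.

```python
import numpy as np


def james_stein_mean(samples):
    """Shrink the mean of Monte Carlo gradient samples toward zero.

    samples: array of shape (n, d), one gradient estimate per row.
    Falls back to the plain mean when d < 3 or n < 2, where the
    shrinkage below is not defined or not beneficial.
    """
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    mean = samples.mean(axis=0)
    if d < 3 or n < 2:
        return mean
    # Assumed: one pooled variance for every coordinate of the sample mean.
    var_of_mean = samples.var(axis=0, ddof=1).mean() / n
    norm_sq = float(mean @ mean)
    if norm_sq == 0.0:
        return mean
    # Positive-part shrinkage factor, clipped so the sign never flips.
    shrink = max(0.0, 1.0 - (d - 2) * var_of_mean / norm_sq)
    return shrink * mean
```

The shrinkage factor is close to 1 when the averaged gradient is large relative to its noise, and moves toward 0 when the noise dominates, which damps updates exactly when the Monte Carlo estimate is least reliable.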

algorithm, estimator, gradient, (12 more...)

arXiv.org Machine Learning

2405.05485

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Transportation > Air (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Local Expectation Gradients for Black Box Variational Inference

Titsias, Michalis, Lázaro-Gredilla, Miguel

Neural Information Processing Systems, Feb-14-2020, 12:28:01 GMT

We introduce local expectation gradients, a general purpose stochastic variational inference algorithm for constructing stochastic gradients by sampling from the variational distribution. This algorithm divides the problem of estimating the stochastic gradients over multiple variational parameters into smaller sub-tasks so that each sub-task intelligently explores the most relevant part of the variational distribution. This is achieved by performing an exact expectation over the single random variable that most correlates with the variational parameter of interest, resulting in a Rao-Blackwellized estimate that has low variance. Our method works efficiently for both continuous and discrete random variables. Furthermore, the proposed algorithm has interesting similarities with Gibbs sampling but, unlike Gibbs sampling, can be trivially parallelized.

algorithm, black box variational inference, local expectation gradient, (5 more...)

Neural Information Processing Systems

Industry: Transportation > Air (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Boosting Black Box Variational Inference

Locatello, Francesco, Dresdner, Gideon, Khanna, Rajiv, Valera, Isabel, Raetsch, Gunnar

Neural Information Processing Systems, Feb-14-2020, 12:26:23 GMT

Approximating a probability density in a tractable manner is a central task in Bayesian statistics. Variational Inference (VI) is a popular technique that achieves tractability by choosing a relatively simple variational approximation. Borrowing ideas from the classic boosting framework, recent approaches attempt to boost VI by replacing the selection of a single density with an iteratively constructed mixture of densities. In order to guarantee convergence, previous works impose stringent assumptions that require significant effort for practitioners. Specifically, they require a custom implementation of the greedy step (called the LMO) for every probabilistic model with respect to an unnatural variational family of truncated distributions.

black box variational inference, implementation, variational inference, (2 more...)

Neural Information Processing Systems

Industry: Transportation > Air (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Perturbative Black Box Variational Inference

Bamler, Robert, Zhang, Cheng, Opper, Manfred, Mandt, Stephan

arXiv.org Machine Learning, Jan-6-2018

Black box variational inference (BBVI) with reparameterization gradients triggered the exploration of divergence measures other than the Kullback-Leibler (KL) divergence, such as alpha divergences. In this paper, we view BBVI with generalized divergences as a form of estimating the marginal likelihood via biased importance sampling. The choice of divergence determines a bias-variance trade-off between the tightness of a bound on the marginal likelihood (low bias) and the variance of its gradient estimators. Drawing on variational perturbation theory of statistical physics, we use these insights to construct a family of new variational bounds. Enumerated by an odd integer order $K$, this family captures the standard KL bound for $K=1$, and converges to the exact marginal likelihood as $K\to\infty$. Compared to alpha-divergences, our reparameterization gradients have a lower variance. We show in experiments on Gaussian Processes and Variational Autoencoders that the new bounds are more mass covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.
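One way to see why odd orders $K$ give a lower bound is that an odd-order Taylor polynomial of $e^x$ never exceeds $e^x$. The sketch below rests on assumed notation that is not in the abstract: $V(z) = \log q(z) - \log p(x, z)$, so that $p(x) = E_q[e^{-V}]$, with a free reference point $V_0$. The function name `perturbative_bound` and the toy Gaussian model are hypothetical, and the estimate is a plain Monte Carlo average rather than the paper's training procedure.

```python
import math

import numpy as np


def perturbative_bound(v, order, v0=None):
    """Monte Carlo estimate of exp(-v0) * E_q[sum_{k<=K} (v0 - V)^k / k!].

    v: samples of V(z) = log q(z) - log p(x, z) with z drawn from q.
    order: odd integer K; v0 defaults to the sample mean of v.
    """
    if order % 2 == 0:
        raise ValueError("order must be odd for the lower-bound property")
    v = np.asarray(v, dtype=float)
    v0 = float(v.mean()) if v0 is None else float(v0)
    delta = v0 - v
    # Odd-order Taylor polynomial of exp(delta), which is <= exp(delta) pointwise.
    poly = sum(delta**k / math.factorial(k) for k in range(order + 1))
    return math.exp(-v0) * float(poly.mean())


def log_normal(z, mean, std):
    return -0.5 * np.log(2.0 * np.pi * std**2) - (z - mean) ** 2 / (2.0 * std**2)


# Toy model (assumed): prior z ~ N(0, 1), likelihood x | z ~ N(z, 1), observed x = 1.
rng = np.random.default_rng(0)
z = rng.normal(0.5, 0.8, size=5000)  # samples from q = N(0.5, 0.8^2)
v = log_normal(z, 0.5, 0.8) - (log_normal(z, 0.0, 1.0) + log_normal(1.0, z, 1.0))
bounds = {k: perturbative_bound(v, k) for k in (1, 3, 5)}
```

With `v0` set to the sample mean of `v`, the order-1 value reduces to the exponential of the usual Monte Carlo ELBO estimate, which matches the abstract's statement that $K=1$ recovers the standard KL bound.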

artificial intelligence, machine learning, variance, (17 more...)

arXiv.org Machine Learning

1709.07433

Country: